Abstract

In this paper we present a procedure to measure the degree of imbalance
of an unbalanced data set. The procedure is based on choosing an appropriate
loglinear model for the subclass frequencies of the data. A measure of
imbalance is then introduced as some function of the chi-squared statistic used
in the goodness-of-fit test for the loglinear model. The proposed procedure
can also be used to measure departures from certain types of balance, such as
proportionality of subclass frequencies, partial balance, and last-stage
uniformity.


Key  words:  Unbalanced  data,  nested  models,  cross-classification  models, 
loglinear  models,  chi-squared  statistic. 


1. Introduction

It is known that in a balanced data situation, parameter estimators and
test statistics pertaining to the effects in the associated model have certain
optimal properties. These properties, however, cannot be maintained once the
data set becomes unbalanced. In this case, the statistical properties of the
aforementioned estimators and test statistics will, to a large extent, depend
on the pattern of the data subclass frequencies. Severe imbalance in the data
can have adverse effects on the analysis, especially if that analysis is an
adaptation of procedures pertaining to balanced data (see, for example,
Cummings and Gaylor 1974).

Ahrens and Pincus (1981) presented two measures of imbalance for the one-way
classification model. These measures were utilized to assess the efficiency
of an associated unbalanced design as compared to a balanced design with
the same number of observations. Other authors have alluded to the need to
measure data imbalance; they include Hess (1979, p. 646) and Tietjen (1974,
p. 576).

The purpose of this paper is to present a general procedure to measure
imbalance of a data set for a given unbalanced model. It is shown that one of
the two measures introduced by Ahrens and Pincus (1981) can be derived as a
special case using this procedure. The proposed procedure can also be utilized
to measure departures from certain types of balance other than complete
balance where frequencies are equal within all the subclasses. These include
partial balance, last-stage uniformity, and the case of proportional subclass
frequencies in cross-classification models.

2.  A  General  Procedure  to  Measure  Imbalance 

A measure of imbalance, denoted by $\phi(D)$, is a function of the subclass
frequencies which are determined by the design D used in the experiment. This
function takes values inside the closed interval [0,1]. Small values of $\phi(D)$
indicate severe imbalance, whereas "near balance" cases are characterized by
large values of $\phi(D)$. The data set is balanced if and only if $\phi(D) = 1$.
Furthermore, this function must remain invariant under any partial or complete
replication of the design (see Ahrens and Pincus 1981).

The development of the function $\phi(D)$ is based on the use of loglinear
models. Several unbalanced models will be considered to illustrate the
application of this procedure. These models include the one-way classification
model, the two-way classification model, the three-way classification model,


$$X^2 = \sum_{i=1}^{a} \frac{(n_i - \bar{n}_{.})^2}{\bar{n}_{.}},$$

which under $H_0$ has an asymptotic chi-squared distribution with $\theta$ degrees of
freedom, where, in general, $\theta$ is the difference between the number of
independent $\Pi_i$'s under $H_a$ and under $H_0$, respectively. In this case
$\theta = a-1$. We define our measure of imbalance as

$$\phi(D) = \frac{1}{1+c^2}, \tag{2.3}$$

where $c^2 = X^2/n_{.}$. We note that $0 \le \phi(D) \le 1$ and the division of $X^2$ by $n_{.}$
causes the measure to be invariant to any replication of the design as required.
Furthermore, $\phi(D) = 1$ if and only if the $n_i$ are equal. We also note
that $\phi(D)$ is identical to the measure $\nu(D)$ given by Ahrens and Pincus (1981).
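As a quick numerical check of (2.3), the following minimal Python sketch computes $\phi(D)$ for a one-way design from its subclass frequencies. It assumes that the expected frequency under balance is $\bar{n}_{.} = n_{.}/a$; the function names and the sample design are illustrative and are not taken from the paper. Expanding $X^2$ gives $1 + c^2 = a\sum_i (n_i/n_{.})^2$, which the second function uses as a cross-check.

```python
def phi_one_way(counts):
    """phi(D) = 1 / (1 + c^2), with c^2 = X^2 / n. for a one-way design."""
    a = len(counts)
    n_total = sum(counts)
    n_bar = n_total / a  # assumed expected frequency under balance
    # Pearson chi-squared statistic against equal subclass frequencies
    x2 = sum((n - n_bar) ** 2 / n_bar for n in counts)
    c2 = x2 / n_total    # dividing by n. gives invariance to replication
    return 1.0 / (1.0 + c2)


def phi_closed_form(counts):
    """Algebraically equivalent form 1 / (a * sum((n_i / n.)^2))."""
    a = len(counts)
    n_total = sum(counts)
    return 1.0 / (a * sum((n / n_total) ** 2 for n in counts))


design = [2, 4, 6, 8]  # hypothetical subclass frequencies
print(phi_one_way(design), phi_closed_form(design))
```

For the hypothetical design (2, 4, 6, 8) both functions return 5/6, about 0.833, and doubling every frequency leaves the value unchanged, in line with the invariance requirement.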


2.2 The Two-Way Classification Model

Consider the model

$$y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \epsilon_{ijk} \tag{2.4}$$

($i = 1,2,\ldots,a$; $j = 1,2,\ldots,b$; $k = 1,2,\ldots,n_{ij}$), where $\mu$ is a fixed unknown
parameter; $\alpha_i$ and $\beta_j$ can be either fixed or random. In this case the design D
is $D = (n_{11}, n_{12}, \ldots, n_{ab})$. The $n_{ij}$'s are considered to have the multinomial
distribution and each $n_{ij}$ has the binomial distribution $B(n_{..}, \Pi_{ij})$, where
$n_{..} = \sum_{i,j} n_{ij}$ and $\Pi_{ij}$ is the probability of belonging to the $(i,j)$th cell.
Hence, $E(n_{ij}) = m_{ij} = n_{..}\Pi_{ij}$. The corresponding loglinear model is

$$\log m_{ij} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij}, \tag{2.5}$$

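where in this case

$$\begin{aligned}
\mu &= \frac{1}{ab}\sum_{i,j}\log m_{ij}, \\
\alpha_i &= \frac{1}{b}\sum_{j}\log m_{ij} - \mu, \\
\beta_j &= \frac{1}{a}\sum_{i}\log m_{ij} - \mu, \\
(\alpha\beta)_{ij} &= \log m_{ij} - \frac{1}{a}\sum_{i}\log m_{ij} - \frac{1}{b}\sum_{j}\log m_{ij} + \mu.
\end{aligned} \tag{2.6}$$

We note that models (2.4) and (2.5) are of the same form, except for the error
term. The $\alpha_i$'s, $\beta_j$'s and $(\alpha\beta)_{ij}$'s satisfy

$$\sum_i \alpha_i = \sum_j \beta_j = \sum_i (\alpha\beta)_{ij} = \sum_j (\alpha\beta)_{ij} = 0.$$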
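The decomposition in (2.6) can be checked numerically. The sketch below is illustrative only: the function name and the sample table of expected frequencies are assumptions, not part of the paper. It computes the loglinear terms from a table of positive $m_{ij}$ values, so that the sum-to-zero constraints and the identity (2.5) can be verified.

```python
import math


def loglinear_terms(m):
    """Return (mu, alpha, beta, ab) for an a-by-b table of positive m_ij."""
    a, b = len(m), len(m[0])
    logm = [[math.log(v) for v in row] for row in m]
    mu = sum(sum(row) for row in logm) / (a * b)
    # (1/b) * sum over j of log m_ij, one value per row i
    row_mean = [sum(row) / b for row in logm]
    # (1/a) * sum over i of log m_ij, one value per column j
    col_mean = [sum(logm[i][j] for i in range(a)) / a for j in range(b)]
    alpha = [r - mu for r in row_mean]
    beta = [c - mu for c in col_mean]
    ab = [[logm[i][j] - col_mean[j] - row_mean[i] + mu for j in range(b)]
          for i in range(a)]
    return mu, alpha, beta, ab


# Hypothetical expected frequencies with proportional rows
m = [[3.0, 4.0, 5.0], [6.0, 8.0, 10.0]]
mu, alpha, beta, ab = loglinear_terms(m)
print(sum(alpha), sum(beta), ab)
```

Because the rows of this sample table are proportional, every interaction term comes out as zero up to rounding, which is the additive case of the loglinear model.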

Let $\hat{m}_{ij}$ denote the maximum likelihood estimate of $m_{ij}$ ($i = 1,2,\ldots,a$;
$j = 1,2,\ldots,b$). Under the hypothesis $H_0: \Pi_{ij} = \Pi_{i.}\Pi_{.j}$ for all i and j (this is
called the hypothesis of independence), where $\Pi_{i.} = \sum_j \Pi_{ij}$ and
$\Pi_{.j} = \sum_i \Pi_{ij}$, the maximum likelihood estimates of $\Pi_{i.}$ and $\Pi_{.j}$ are
$n_{i.}/n_{..}$ and $n_{.j}/n_{..}$, respectively, where $n_{i.} = \sum_j n_{ij}$ and
$n_{.j} = \sum_i n_{ij}$. Hence, $\hat{m}_{ij} = n_{i.}n_{.j}/n_{..}$. This is the case
of proportional subclass frequencies. The corresponding test statistic is

$$X^2 = \sum_{i,j} \frac{(n_{ij} - \hat{m}_{ij})^2}{\hat{m}_{ij}},$$

which under $H_0$ has an asymptotic chi-squared distribution with $\theta = (a-1)(b-1)$
degrees of freedom. If $c^2 = X^2/n_{..}$, then

$$\phi(D) = \frac{1}{1+c^2} \tag{2.7}$$

is a measure of departure from proportionality of the subclass frequencies
with $\phi(D)$ attaining the value one when these frequencies are proportional. In
the latter case, model (2.5) takes the additive form

$$\log m_{ij} = \mu + \alpha_i + \beta_j. \tag{2.8}$$

Under the hypothesis of complete balance, namely, $H_0: \Pi_{ij} = 1/(ab)$ for all
i and j, $\hat{m}_{ij} = n_{..}/(ab)$, and the corresponding statistic,

$$X^2 = \sum_{i,j} \frac{[n_{ij} - n_{..}/(ab)]^2}{n_{..}/(ab)}, \tag{2.9}$$

is asymptotically distributed as a chi-squared variate with $\theta = ab - 1$ degrees
of freedom. A measure of departure from complete balance is then given by

$$\phi(D) = \frac{1}{1+c^2}, \tag{2.10}$$

where $c^2 = X^2/n_{..}$. In this case model (2.8) is reduced to just

$$\log m_{ij} = \mu.$$
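To make the two hypotheses of this section concrete, here is a minimal Python sketch that evaluates $\phi(D)$ for a two-way table, once against the fitted values $n_{i.}n_{.j}/n_{..}$ (proportionality) and once against $n_{..}/(ab)$ (complete balance). The function names and the sample table are hypothetical; the sketch assumes every row and column total is positive.

```python
def phi_from_fit(counts, fitted):
    """phi = 1 / (1 + X^2 / n..) for observed counts and fitted values."""
    n_total = sum(counts)
    x2 = sum((n - m) ** 2 / m for n, m in zip(counts, fitted))
    return 1.0 / (1.0 + x2 / n_total)


def phi_proportional(table):
    """Departure from proportional subclass frequencies."""
    n_total = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    counts = [n for row in table for n in row]
    # fitted values under independence, in the same row-major order as counts
    fitted = [r * c / n_total for r in row_tot for c in col_tot]
    return phi_from_fit(counts, fitted)


def phi_complete(table):
    """Departure from complete balance."""
    counts = [n for row in table for n in row]
    m_hat = sum(counts) / len(counts)  # n.. / (ab)
    return phi_from_fit(counts, [m_hat] * len(counts))


table = [[2, 4], [4, 8]]  # hypothetical cell frequencies
print(phi_proportional(table), phi_complete(table))
```

The sample table has proportional rows, so the first measure equals one, while the measure of departure from complete balance is 0.81. This illustrates that a design can satisfy a weaker type of balance while still being far from equal cell frequencies.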


2.3  The  Three-Way  Classification  Model 


Suppose  we  consider  the  model 


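$$y_{ijkl} = \mu + \alpha_i + \beta_j + \gamma_k + (\alpha\beta)_{ij} + (\alpha\gamma)_{ik} + (\beta\gamma)_{jk} + (\alpha\beta\gamma)_{ijk} + \epsilon_{ijkl} \tag{2.11}$$

($i = 1,2,\ldots,a$; $j = 1,2,\ldots,b$; $k = 1,2,\ldots,c$; $l = 1,2,\ldots,n_{ijk}$), where $\alpha_i$, $\beta_j$, and
$\gamma_k$ can be either fixed or random. The design D consists of the cell frequencies
$n_{111}, n_{112}, \ldots, n_{abc}$. Following the approach used in the earlier two models,
the expected cell frequency $m_{ijk} = E(n_{ijk})$ is expressed in terms of the loglinear model

$$\log m_{ijk} = \mu + \alpha_i + \beta_j + \gamma_k + (\alpha\beta)_{ij} + (\alpha\gamma)_{ik} + (\beta\gamma)_{jk} + (\alpha\beta\gamma)_{ijk}, \tag{2.12}$$

where

$$\sum_i \alpha_i = \sum_j \beta_j = \sum_k \gamma_k = \sum_i (\alpha\beta)_{ij} = \sum_j (\alpha\beta)_{ij} = \cdots = \sum_k (\alpha\beta\gamma)_{ijk} = 0.$$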

From (2.12) several reduced models may be considered. These models are given
in Table 1. The goodness-of-fit of these models can be checked by using
Pearson's approximate chi-squared statistic

$$X^2 = \sum_{i,j,k} \frac{(n_{ijk} - \hat{m}_{ijk})^2}{\hat{m}_{ijk}}, \tag{2.13}$$

where $\hat{m}_{ijk}$ is the maximum likelihood estimate of $m_{ijk}$, or by the likelihood
ratio statistic

$$G^2 = 2\sum_{i,j,k} n_{ijk}\log\frac{n_{ijk}}{\hat{m}_{ijk}} \tag{2.14}$$

(see Agresti 1984, p. 48). Both $X^2$ and $G^2$ are asymptotically distributed as
chi-squared variates with the same degrees of freedom. The $\hat{m}_{ijk}$ estimates for
the models in Table 1 are given in the same table along with the corresponding
$X^2$ and $G^2$ statistics and associated degrees of freedom. The $G^2$ statistic has
the desirable feature of being monotone increasing as terms are deleted from
the full model in (2.12), that is, $0 = G_1^2 \le G_2^2 \le G_3^2 \le G_4^2 \le G_5^2$ (see Agresti
1984, p. 57). The $G^2$ statistic can, therefore, be used to compare two nested
models (that is, one model is obtained from the other by deleting one or more
terms) that give adequate fits to the cell frequencies. Thus, with the help of
the $G^2$ statistic it is possible to identify one or more models in Table 1 that
provide adequate fits. For such models departures of cell frequencies from
their expected values can be measured by means of the function $\phi(D)$ in (2.10)
where $c^2$ is given by the corresponding value of $X^2$ in Table 1 divided by $n_{...}$.

Model V in Table 1 corresponds to the case of complete balance, whereas
Model IV is associated with the case of proportional subclass frequencies.

Model III corresponds to a case of conditional proportional subclass frequencies
involving values of i, k for a fixed j, and values of j and k for a fixed i. In
Model II we have a case of conditional proportional subclass frequencies
involving only values of i and k for a fixed j. Model I is the full loglinear model.
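The statistics (2.13) and (2.14) are straightforward to evaluate once fitted values are available. The sketch below handles only the complete balance case (Model V), under the assumption that its fitted value is $n_{...}/(abc)$ in every cell with $abc - 1$ degrees of freedom; the other rows of Table 1 are not reproduced here. Function names and the sample frequencies are illustrative.

```python
import math


def pearson_x2(counts, fitted):
    """Pearson statistic, as in (2.13)."""
    return sum((n - m) ** 2 / m for n, m in zip(counts, fitted))


def likelihood_ratio_g2(counts, fitted):
    """Likelihood ratio statistic, as in (2.14); empty cells contribute zero."""
    return 2.0 * sum(n * math.log(n / m)
                     for n, m in zip(counts, fitted) if n > 0)


def complete_balance_summary(cube):
    """X^2, G^2, degrees of freedom and phi(D) for an a x b x c table."""
    cells = [n for plane in cube for row in plane for n in row]
    n_total = sum(cells)
    fitted = [n_total / len(cells)] * len(cells)  # assumed Model V fit
    x2 = pearson_x2(cells, fitted)
    g2 = likelihood_ratio_g2(cells, fitted)
    phi = 1.0 / (1.0 + x2 / n_total)
    return {"X2": x2, "G2": g2, "df": len(cells) - 1, "phi": phi}


cube = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]  # hypothetical 2 x 2 x 2 design
print(complete_balance_summary(cube))
```

For this hypothetical table the Pearson statistic is 28/3 on 7 degrees of freedom, which gives a measure of 27/34, about 0.79.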


2.4 The Two-Fold Nested Classification Model


Let  us  now  consider  the  model 


$$y_{ijk} = \mu + \alpha_i + \beta_{ij} + \epsilon_{ijk} \tag{2.15}$$

($i = 1,2,\ldots,a$; $j = 1,2,\ldots,b_i$; $k = 1,2,\ldots,n_{ij}$), where $\alpha_i$ denotes the nesting effect
and $\beta_{ij}$ denotes the nested effect. The design D consists of the values of
$b_1, b_2, \ldots, b_a$ in addition to the $n_{ij}$ values. In the complete balance case $b_i = b$
for all i and $n_{ij} = n$ for all i and j. A condition weaker than complete balance
is last-stage uniformity which requires that $n_{ij} = n$ for all i and j. If, however,
$n_{ij} = n_{ij'}$ for $j \ne j'$ and $i = 1,2,\ldots,a$, then the design is partially
balanced. It is known that when all the effects in (2.15) are random, last-stage
uniformity is a sufficient condition for the sums of squares, in the conventional
analysis of variance table, to be independently distributed as scaled
chi-squared variates (see Tietjen 1974, p. 575). Under partial balance, however,
the sums of squares for the $\alpha_i$ and $\beta_{ij}$ effects are independent, but do not
have the scaled chi-squared distribution (see Cummings 1972). It is, therefore,
of interest to measure departures from complete balance, last-stage uniformity,
and partial balance.

The loglinear model corresponding to model (2.15) can be obtained as follows:
let $m_{ij}$ denote the expected frequency $E(n_{ij})$. Then, $m_{ij} = n_{..}\Pi_{ij} = n_{..}\Pi_i\Pi_{j|i}$,
where $\Pi_i$ denotes the probability of belonging to the ith level of the nesting
factor, $\Pi_{j|i}$ denotes the conditional probability of belonging to a level of
the nested factor given the ith level of the nesting factor. Hence, $\log m_{ij}$ can
be represented by the loglinear model

$$\log m_{ij} = \mu + \alpha_i + \beta_{ij}, \tag{2.16}$$

where

$$\begin{aligned}
\mu &= \log n_{..} + \frac{1}{b_{.}}\sum_{i=1}^{a} b_i \log \Pi_i + \frac{1}{b_{.}}\sum_{i=1}^{a}\sum_{j=1}^{b_i} \log \Pi_{j|i}, \\
\alpha_i &= \log \Pi_i + \frac{1}{b_i}\sum_{j=1}^{b_i} \log \Pi_{j|i} - \frac{1}{b_{.}}\sum_{i=1}^{a} b_i \log \Pi_i - \frac{1}{b_{.}}\sum_{i=1}^{a}\sum_{j=1}^{b_i} \log \Pi_{j|i}, \\
\beta_{ij} &= \log \Pi_{j|i} - \frac{1}{b_i}\sum_{j=1}^{b_i} \log \Pi_{j|i}.
\end{aligned}$$


In the partial balance case, $\Pi_{j|i} = 1/b_i$ for all i and j; hence, the maximum
likelihood estimate of $m_{ij}$ is $\hat{m}_{ij} = n_{i.}/b_i$, where $n_{i.} = \sum_{j=1}^{b_i} n_{ij}$ ($i = 1,2,\ldots,a$),
since in this case $\hat{\Pi}_i = n_{i.}/n_{..}$. A measure of partial balance is then given by

$$\phi(D) = \frac{1}{1+c^2}, \tag{2.17}$$

where

$$c^2 = X^2/n_{..}, \qquad X^2 = \sum_{i,j} \frac{(n_{ij} - n_{i.}/b_i)^2}{n_{i.}/b_i}. \tag{2.18}$$

Under partial balance $X^2$ has asymptotically the chi-squared distribution with
$b_{.} - a$ degrees of freedom, where $b_{.} = \sum_{i=1}^{a} b_i$. This follows from the fact that
in the general case, the number of linearly independent $\Pi_{ij}$'s is $b_{.} - 1$ whereas
under partial balance this number is just $a - 1$. We note that this case can be
represented by the loglinear model

$$\log m_{ij} = \mu + \alpha_i. \tag{2.19}$$
(2.19) 


Under last-stage uniformity, $\Pi_{ij} = 1/b_{.}$ for all i and j. The loglinear
model in this case has the form

$$\log m_{ij} = \mu. \tag{2.20}$$

The maximum likelihood estimate of $m_{ij}$ is given by $\bar{n}_{..} = n_{..}/b_{.}$. Hence, a
measure of departure from last-stage uniformity is given by (2.17), where

$$c^2 = X^2/n_{..}, \qquad X^2 = \sum_{i,j} \frac{(n_{ij} - \bar{n}_{..})^2}{\bar{n}_{..}}, \tag{2.21}$$

which has the asymptotic chi-squared distribution with $b_{.} - 1$ degrees of freedom.
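The two measures just defined differ only in the fitted value used for each cell. A minimal Python sketch follows, in which a two-fold nested design is represented as a list of lists so that `design[i]` holds $n_{i1}, \ldots, n_{ib_i}$. This representation, the function names, and the sample design are assumptions made for illustration.

```python
def phi_partial_balance(design):
    """phi(D) from (2.17) with X^2 as in (2.18): fitted value n_i. / b_i."""
    n_total = sum(sum(row) for row in design)
    x2 = 0.0
    for row in design:
        m_hat = sum(row) / len(row)  # n_i. / b_i
        x2 += sum((n - m_hat) ** 2 / m_hat for n in row)
    return 1.0 / (1.0 + x2 / n_total)


def phi_last_stage_uniformity(design):
    """phi(D) from (2.17) with X^2 as in (2.21): fitted value n.. / b."""
    cells = [n for row in design for n in row]
    n_total = sum(cells)
    n_bar = n_total / len(cells)     # n.. / b.
    x2 = sum((n - n_bar) ** 2 / n_bar for n in cells)
    return 1.0 / (1.0 + x2 / n_total)


design = [[2, 2, 2], [4, 4]]  # hypothetical design with a = 2, b_1 = 3, b_2 = 2
print(phi_partial_balance(design), phi_last_stage_uniformity(design))
```

The sample design is partially balanced, so the first measure equals one, while the second is 49/55, about 0.89, because the $n_{ij}$ differ between the two levels of the nesting factor.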

Unlike the former two cases, departure from complete balance can be attributed
to variation in the values of $b_1, b_2, \ldots, b_a$, or to variation in the $n_{ij}$
values. We thus need to measure imbalance with regard to the $b_i$'s and also
with regard to the $n_{ij}$'s. We shall consider that the $b_i$'s form a multinomial
distribution independently of the multinomial distribution of the $n_{ij}$'s with
$b_i$ being distributed as a binomial $B(b_{.}, \tau_i)$. Hence, $d_i = E(b_i) = b_{.}\tau_i$
($i = 1,2,\ldots,a$). A measure of imbalance concerning the $b_i$'s is, therefore, given by

$$\phi_1(D) = \frac{1}{1+c_1^2}, \tag{2.22}$$

where

$$c_1^2 = X_1^2/b_{.}, \qquad X_1^2 = \sum_{i=1}^{a} \frac{(b_i - \bar{b}_{.})^2}{\bar{b}_{.}}, \tag{2.23}$$

where $\bar{b}_{.} = b_{.}/a$. This statistic has the asymptotic chi-squared distribution
with $a - 1$ degrees of freedom when $\tau_i = 1/a$ ($i = 1,2,\ldots,a$). On the other hand,
a measure of imbalance concerning the $n_{ij}$'s is

$$\phi_2(D) = \frac{1}{1+c_2^2}, \tag{2.24}$$

where

$$c_2^2 = X_2^2/n_{..}, \qquad X_2^2 = \sum_{i,j} \frac{(n_{ij} - \bar{n}_{..})^2}{\bar{n}_{..}}. \tag{2.25}$$
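The measure $\phi_1(D)$ in (2.22) depends on the design only through the numbers of nested levels $b_i$. A small Python sketch, again assuming a list-of-lists representation of the design in which `len(design[i])` is $b_i$ (the names and sample values are hypothetical):

```python
def phi_nested_levels(design):
    """Return (phi_1, X_1^2, df) for the b_i's, following (2.22) and (2.23)."""
    b = [len(row) for row in design]  # b_i: number of nested levels within level i
    a, b_total = len(b), sum(b)
    b_bar = b_total / a               # expected b_i when tau_i = 1/a
    x1 = sum((bi - b_bar) ** 2 / b_bar for bi in b)
    phi1 = 1.0 / (1.0 + x1 / b_total)
    return phi1, x1, a - 1


design = [[2, 2, 2], [4, 4]]  # hypothetical design with b_1 = 3 and b_2 = 2
print(phi_nested_levels(design))
```

For the sample design $X_1^2 = 0.2$ on one degree of freedom and $\phi_1(D) = 25/26$, about 0.96, which indicates that the $b_i$'s are close to equal.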


The statistic $X_2^2$ is the same as the one used in last-stage uniformity. Since
the multinomial distribution of the $b_i$'s is independent of the multinomial
distribution of the $n_{ij}$'s, $X_1^2$ is statistically independent of $X_2^2$, hence $X_1^2 + X_2^2$
is asymptotically distributed as a chi-squared variate with $b_{.} + a - 2$ degrees of
freedom. Now, to measure departure from complete balance we use the measure

$\phi(D) =$    (2.26)

where
2.5 The Three-Fold Nested Classification Model


In  this  section  we  consider  the  model 


$$y_{ijkl} = \mu + \alpha_i + \beta_{ij} + \gamma_{ijk} + \epsilon_{ijkl} \tag{2.28}$$

($i = 1,2,\ldots,a$; $j = 1,2,\ldots,b_i$; $k = 1,2,\ldots,c_{ij}$; $l = 1,2,\ldots,n_{ijk}$). The values
of $b_i$, $c_{ij}$, and $n_{ijk}$ make up the design D. Here different types of balance
can be considered; each is a stronger type of balance than the one preceding it:

i) Last-stage partial balance, that is, partial balance with respect to the
$n_{ijk}$ values. There are two kinds of such partial balance; in the first
kind $n_{ijk}$ depends on i and j only, and in the second kind $n_{ijk}$ depends
on i only.

ii) Last-stage uniformity when the $n_{ijk}$'s are equal for all values of i, j,
and k.

iii) Last-stage uniformity and next-to-last-stage partial balance, that is,
when $c_{ij}$ depends on i only.

iv) Last-stage uniformity as well as next-to-last-stage uniformity, that
is, when the $c_{ij}$'s are equal for all values of i and j.

v) Complete balance. This occurs when equality of frequencies occurs
within all the subclasses.

Each type can be characterized by one or more loglinear models and a
corresponding measure of imbalance can be obtained accordingly. For example, for
Type (iii), if the $n_{ijk}$'s are considered to have a multinomial distribution
with $n_{ijk}$ being distributed as a binomial $B(n_{...}, \Pi_{ijk})$, where $n_{...} = \sum_{i,j,k} n_{ijk}$

note the probability of belonging to level i of A, level j of B nested within
i, and level k of C. As before, the $n_{ijk}$'s are considered to have a
multinomial distribution with $m_{ijk} = n_{...}\Pi_{ijk}$.

For this model we can have four types of balance:

i) Proportional subclass frequencies involving the AB subclasses and the
levels of factor C, that is, $\Pi_{ijk} = \Pi_i\Pi_{j|i}\Pi_k$.

ii) Partial balance with respect to the $n_{ijk}$ values, that is, $\Pi_{ijk} = \Pi_i/(b_i c)$.

iii) Last-stage uniformity, that is, $\Pi_{ijk} = 1/(b_{.}c)$ for all i, j, and k.

iv) Complete balance, that is, $\tau_i = 1/a$ and $\Pi_{ijk} = 1/(b_{.}c)$, where $\tau_i$ is the
ith binomial probability associated with the multinomial distribution
of the $b_i$'s.

Each of the above four types can be represented by a loglinear model.
These models are given in Table 2. Furthermore, for each of these four types
a measure of imbalance is obtained by using formula (2.3). The value of $c^2$ in
this formula and the degrees of freedom for the corresponding asymptotic
chi-squared statistics are also given in Table 2.


3. Numerical Examples

i) Cummings and Gaylor (1974) used several designs to illustrate the combined
effects of dependence and nonchi-squaredness of the analysis of variance
mean squares on the size of Satterthwaite's approximate F-test for variance
component testing in a two-fold nested model. We shall consider three of
these designs which are described in Table 3 and are also represented
graphically in Figure 1.

For each of the three designs we measure departures from partial balance
and last-stage uniformity by using formula (2.17) with $X^2$ being given by (2.18)
for partial balance and by (2.25) for last-stage uniformity. We also measure
departure from complete balance by applying formulas (2.26) and (2.27). The
results are given in Table 4.

Table  3 

Designs  For  a  Two-Fold  Nested  Model 


Table 4

Values of $\phi(D)$ For The Three Designs In Table 3

Design    Partial Balance    Last-Stage Uniformity    Complete Balance
  1            1                  .735                     .58
  2            .69                .69                      .69
  3            .73                .61                      .54

From  Table  4  we  note  that  other  than  partial  balance  for  Design  1,  none  of  the 
designs  has  strong  balance  properties.  Of  all  three  designs,  Design  3  is  the 
most  unbalanced  with  respect  to  last-stage  uniformity  and  complete  balance. 


ii) Bliss (1967, p. 355) described a nested experiment involving three factors
with an associated model of the form given by (2.28). In this experiment, a = 11


where $\bar{c} = c_{..}/b_{.}$, hence, $\phi(D) = .83$. We note that this is equal to the previous
measure value for Type (iii) since both $c_{i.}$ and $b_i$ in formula (2.30) do not
depend on i; thus, $c_{i.}/b_i = c_{..}/b_{.} = \bar{c}$. We also note that since the $b_i$'s are
equal, the value $\phi(D) = .83$ is also a measure of departure from complete balance,
which is Type (v) balance.


4. Concluding Remarks


We have introduced a procedure for measuring the degree of imbalance that
is associated with an unbalanced model. The procedure applies to
cross-classification models, nested classification models, and to models with a
mixture of cross-classified and nested effects. It can also be used to measure
departures from different types of balance, especially in nested models where
imbalance can affect various stages of the nested design. Several examples of
unbalanced models were studied. From these examples it is easy to see that this
procedure is general enough to apply to any unbalanced model.

With the help of this procedure it is now possible to describe in a
quantitative manner different kinds of imbalance, such as extreme imbalance,
moderate imbalance, and near balance. This can serve as an indicator of the
suitability of the approximate methods that are adapted from balanced-data-based
procedures and used to analyze an unbalanced model, particularly when the
appropriate measure value is near unity. It is to be cautioned, however, that
low values of that measure do not necessarily mean that such approximate
methods are inadequate. Cummings and Gaylor (1974), for example, noted that
for one extremely unbalanced design, namely, Design 3 in Table 3, their
approximate F-test performed very well. They attributed this behavior to
counterbalancing effects which appear to reduce, rather than compound, the effect
of imbalance on the standard analysis of variance.